RED.LM5[LSP,JRA] - www.SailDart.org

perm filename RED.LM5[LSP,JRA] blob sn#230414 filedate 1976-08-05 generic text, type C, neo UTF8
COMMENT ⊗   VALID 00004 PAGES
C REC  PAGE   DESCRIPTION
C00001 00001
C00002 00002	.ss(Completeness  results)
C00009 00003	For terms without normal forms, the situation is not so straightforward.
C00017 00004	
C00039 ENDMK
C⊗;
.ss(Completeness  results)
.group;
Since we shall always have in mind the fixed interpretation given by πV and 
$=, and we have shown that this provides a model for the ?λ-calculus, it
is convenient to simplify notation:

.begin indent 5,7;
1) by allowing the terms themselves to stand for their values in $D, and

2) by identifying elements of $D with their images in [$D→$D] under
$f (and correspondingly identifying functions in [$D→$D] with their
images in $D under $y).
.apart;
.end
This use of the terms themselves to denote their values in $D is quite precise
if we say that:
.begin indent 5,7;
1) free variables are interpreted as ranging over $D.

2) when we write an abstraction ∪λx.M∩, we understand that ∪x∩ is restricted
to ranging over $D, and by the abstraction we really mean the image of the corresponding
function under $y, and

3) when a term occurs as the rator of a combination, we really mean its image under
$f.
.end
By virtue of Theorem 8, we may use ordinary ∪λ∩-conversion to manipulate
these (expressions denoting) elements of $D. Whenever we do need to be careful
about distinguishing between terms and their values, or between elements of
$D and their images in [$D→$D], we can fall back on the formal
notation and write in the appropriate πV, πr, $f, $y, etc., but most of the time
we can tolerate the ambiguity.

So far all our results have asserted equality of terms in $D.  A model is not very
interesting if all terms are equal, thus we might ask whether there are terms
which are not equal.  First, $D is a complete lattice, and just as lattice
equality determined the relation $= between terms, so π≤ induces a partial
order relation between terms:
.begin center;
?Mπ≤?N iff $V?M|(πr)π≤$V?N|(πr) for all πrε$N ⊗↓We use the same symbol "π≤"
for the induced relation.←
.end
This partial order is substitutive, that is, ?Mπ≤?N => ∪C[M]π≤∪C[N]∩
 for all contexts ∪C[ ].


%7LEMMA 9%1
.begin nofill;
	a)  The terms ?K and ∪KI∩ are incomparable. ⊗↓?K and ?I are defined {yon(R30)}.←
	b)  The terms ?I and ?K are incomparable.
Proof:
.fill; indent 10,10;
a)  Suppose ∪Kπ≤∪KI∩.  Then ∪x = Kxy π≤ ∪KIxy = y ∩holds for all ∪x,yε$D, so all
elements of $D would be equal, contradicting our assumption that $D contains
more than one element.  Hence ∪Kπ≥∪KI∩. Similarly, ∪KIπ≥∪K∩.

b)  Similarly to the proof of a) from the observation that ∪I#I(Kx)y=x∩,
while ∪KI(Kx)y=y∩.

.end
.indent 0,0;

So the models are non-trivial.  For terms having a normal form we can prove
the converse of Theorem 8, that is, that equality in the models implies
convertibility, with the aid of a theorem of B⊗ohm.  First, we make the
definition that two terms ?M and ?N are said to be %3⊗→separable↔←%1 if there
is a context ∪C[ ]∩ such that

.begin center;
∪C[M] $c ∪I∩, and ∪C[N] $c ?K.
.end

%7THEOREM 10%1 (⊗→B%5o%1hm's theorem↔←)####If ?M and ?N have distinct (not ?α-convertible)
normal forms, then ?M and ?N are separable.

We will not prove it here.⊗↓The original reference, in Italian, is:
B⊗ohm, C., "Alcune proprieta delle forme ?β-?h-normali nel ?λ-k-calcols",
Consiglio Nazionalle delle Richerche, No. 696, Rome, 1968.←
It is worth mentioning that:

.begin indent 15,10;
1)  The proof gives an effective procedure for finding the separating context, and

2)  For closed terms, the context which separates them is of the form
∪[#]?X1K####k>0,#?Xi#closed#0<i≤k.
.end

For distinct normal forms, this theorem is a sharpening of the Church-Rosser
Theorem, which showed that distinct normal forms cannot be proved equal by
the conversion rules; B⊗ohm's Theorem says that if we ever postulated,
as an extra axiom for the ?λ-calculus, the equality of two distinct normal
forms, we would have an inconsistent system. 

%7Corollary 10.1%1  If ?M and ?N have distinct normal forms, then ?M and
?N are incomparable in $D.
.begin indent 0,10;
Proof:######Let ∪C[#]∩ be the context determined by Theorem 10.  If
?Mπ≤?N, then ∪I#$=#∪C[M]#π≤#∪C[N]#$=#?K, contradicting Lemma 9 b).

.end
 So the conversion rules are complete for terms having normal forms.
That is, for all terms ?M, ?N having normal forms, ?M#=#?N is provable iff
?M#$=#?N iff ?M and ?N both $r to the same (up to ?α-$c) normal form, by
normal order reduction.

For terms without normal forms, the situation is not so straightforward.
In particular, corollary 10.1 does not hold if either ?M or ?N fails to
have a normal form.  But everything we have said so far is true for all lattice 
models of the ?λ-calculus; the properties of terms without normal forms
are dependent on the deeper structure of particular models and we have yet
to sort out the "natural ordering" of these terms from a computational
point of view.

Before we consider the particular models discovered by Scott in 1969, it is 
worth noting that we cannot expect, nor would we want, completeness of the
conversion rules for all equations between terms; in particular, there are 
examples, such as the unsolvable terms, which are not interconvertible but
do behave alike from a computational point of view.  For Scott's original
models there is a kind of "limiting completeness" of the rules of conversion
which does hold.  This leads to a characterization of the relation $= (and
also of the relation π≤) between terms which has an obvious and natural
counterpart for programming languages.

The following theorems are proved in Wadsworth [15], using laws obtained from
the properties of the construction of Scott's models.

%7THEOREM 11%1
.begin nofill;
	a)  For ⊗d≡∪λx.xx∩,  ⊗d(⊗d) =⊗b
	b)  For ∪Y%4λ∪≡λf.(λx.f(xx))(λx.f(xx)), Y%4λ%1 =πY
	  where πY is the lattice-theoretic least fixed point operator.
	c)  For ∪J≡Y%4λ∪F∩ with ∪F≡λf.λx.λy.x(fy), J =?I.
.end

We have made use of the ambiguity between terms and their values in $D in stating
the theorem, for example, in part a) we really mean that $V⊗d(⊗d)|(πr)#=#⊗b
for all πr.  And in part b) we might explain that we know from lattice theory
that the function

.begin center;turnoff "{","}";
πY : [$D →$D] → $D  defined by
πY(πf) = πL{πf%8n%1}
.end

maps continuous functions to their least fixed points.  Moreover, πY itself also 
is continuous so, by virtue of the isomorphism $D=[$D→$D], we can think of
πY as being an element of $D; more precisely, we can identify πY as the element
πY%4∞%1ε$D given by

.begin center;
πY%4∞∪ ≡ $y(?λπfε$D.πY($f(πf))) = $y(πY⊗x$f)
.end

Then, properly stated, b) reads: for all πr, $V?Y%4λ%1|(πr) = πY%4∞%1.

%7THEOREM 12%1##?I is the only fixed point of ∪F≡λf.λx.λy.x(fy)∩ in $D.

%7LEMMA 13%1  For ∪G≡λy.λf.f(yf)∩ we have πY∪(G)  = πY  = ∪G(πY)

For proofs of the above see Wadsworth [15].

.begin nofill;group;
%7THEOREM 14%1
			Let  ?Y%80%1 ≡ ?Y%4λ
			     ?Y%8i+1%1 ≡ ?Y%8i%1∪(G)∩  i≥0,
			Then, for all i≥0, ?Y%8i%1  = πY
Proof:
.begin indent 10,10;
The case i=0 is just Theorem 11b).  
If, inductively, ⊗→%dY%8i%1↔← = πY, then ?Y%8i+1%1 ≡ ?Y%8i∪(G) = πY∪(G) = πY, by Lemma 13.

.end
.end

On the other hand, we know that ?Y%80%1 and ?Y%81%1 cannot be proved equal by
means of the conversion rules.  This was the first example, discovered by David
Park, of the incompleteness of the conversion rules for Scott's models.  The
equality of all these ?Y%8i%1's is quite a desirable property from a computational
point of view and, formally, the properties of the construction provide a proof
that this is the case in Scott's models.  Further motivation for wishing to 
regard the ?Y%8i%1's as equivalent will be seen via the concepts of "approximate
reduction" and "approximate normal form" yet to be discussed.

That ?I#$=∪J∩ is rather more difficult to motivate for it gives an example of
a term with a normal form which is equal to one which does not have a normal
form.  This phenomenon is not limited to the particular normal form ?I.

%7Corollary 11.1%1  For any normal form ?N, there is a ?λ-term ?X with no normal
form such that ?N#$=#?X.
.begin indent 0,10;
Proof:##### Choose an occurrence of a variable ?x in ?N which is not the rator
of a combination.  (Such an occurrence always exists; for example, choose the
rightmost occurrence of a variable in ?N.)  Let ?X be the ?λ-term obtained by 
replacing this occurrence of ?x by the equivalent ?λ-term ?Y%4λ∪(F)(x)
(≡#J(x)#$=#∪I(x)#=#x)∩.  Then ?N$=?X, and the replaced occurrence of ?x is
sufficient to guarantee that ?X has no normal form.

.end

Surprising though this result may seem at first, one can make a strong case
for wishing to regard ?I and ?J as equivalent whenever one accepts 
?h-conversion as an axiom.


We have seen examples of ?λ-terms which are equal in Scott's models but
cannot be proved so by the theory of conversion.  Moreover we have indicated
that this is neither surprising nor undesirable, the reason being, roughly,
that equations between terms are functional equations which tell you an
infinite number of facts - specifically, that for any argument the left
hand function applied to that argument gives the same result as the right
hand function applied to that argument - and, in general, it is impossible
(because of G⊗odel's incompleteness results) to effectively axiomatize all
true equations of such a functional kind.

Nevertheless, although the theory is too weak to prove all true equations
between terms in $D, all implementations of the ?λ-calculus %2are%1 based
on the reduction rules.  In an implementation we are not so much concerned
with proving terms equal as with computing out the "values" (say, normal
forms) of terms by successive reduction.  (In practice, the reductions are
often simulated by representing terms and other intermediate items of interest
in certain kinds of data structures - stacks, symbol tables, pointers, etc. -
but conceptually, the computations performed on these structures are meant
to represent the process of reduction.)  So if a model is to be of any
relevance to computational reality, it should have some connection with the
process of reduction - there should be some connection between our "semantics" for
the ?λ-calculus, given via $D and the function πV, and the "means of computation,"
given by the reduction rules.

To make a connection, we extend the ordinary (syntactical) theory of the
?λ-calculus to allow %3partial terms%1 and %3partial%1, or %3approximate,
reductions%1, in much the same way that Scott's more general (semantical)
theory of computation extends the conventional notion of data type to allow
partial (and infinite) objects and approximations.  Then we can discuss %3limits%1
of better and better approximate reductions and tie these in with the notion of
limit we already have in the lattice models.

For the ?λ-calculus an appropriate limiting process is based on ?β-reduction.  Suppose
we wish to "evaluate" a term ?M and we start reducing ?M (in some order).  If
?M has a normal form and the reduction actually reaches this normal form,
everything is fine and we take the normal form obtained as the "value" of
?M.  If not, conventionally, the possibility is not considered of attributing
a "value" to a reduction which (apparently) fails to terminate; one simply
concludes that the initial term ?M does not appear to have a normal form.  This
view is unnecessarily pessimistic for there %2is%1 something one can say
based on the intermediate terms in the reductions.

For instance, suppose ?M is some term and we have reduced it to a term ?M1, say
.begin center;
?M1∪ ≡ λx.λy.y(x((λz.P)Q))(xy)((λw.R)S)∩
where ∪P, Q, R, S∩ are terms.
.end
At this stage we cannot tell whether ?M has a normal form or not, because there
are still at least two redexes in ?M1.  However, from the form of ?M1, we can
say that if ?M1 is going to have a normal form, this must be of the form
.begin center;
∪λx.λy.y(x( ? ))(xy)( ? )∩
.end
because any further ?β-reduction of ?M1 can affect only the parts where we
have written "%d?%1".  We adjoin to the ?λ-calculus the special constant
symbol "⊗→%gW%1↔←" to stand for such, as yet, "undetermined" parts of normal forms.  Then
we shall say that
.begin center;
?A1∪ ≡ λx.λy.y(x(⊗O∪))(xy)(⊗O∪)∩
.end
is a %3⊗→partial normal form↔← of ?M, partial because it tells us something
of the structure of the normal form of ?M (if it exists), but does not give
complete information.  The obvious interpretation of ⊗O in the lattice models
is as the least element ⊗b, thus we add the clause

.begin center;

4)  $V⊗O|(πr) = ⊗b
.end

to the definition of πV.  Then, under the induced partial ordering between
terms, ⊗O approximates every term, i.e., ⊗Oπ≤?M for all terms ?M, and the
substitutivity property gives ?A1π≤?M1, whence also ?A1π≤?M (since 
reductions preserve values in $D).  So the partial normal form ?A1 is
certainly an approximation to ?M, and we shall now call it an %3⊗→approximate
normal form↔← of ?M.

What we have done is considered some reduction of ?M to a term ?M1 and then
replaced any remaining ?β-redexes in ?M1 by ⊗O to obtain an approximate
normal form ?A1.  Clearly this idea can be applied to all reductions of ?M
and in this way we obtain a whole set of approximate normal forms; in $D, their 
limit is equal to the value of the starting term ?M.  So although the
reductions of a term may not succeed in "evaluating" it in finite time, in
a certain sense they do "converge" to its value in the limit.  To formalize
these ideas and state the theorem we first introduce some more 
terminology.

The system which allows ⊗O to occur as a subterm will be called the 
⊗→%glW%1-%3calculus%1↔←.  Terms of this calculus will be called %3⊗→approximate terms%1↔←,
or ⊗→%glW%1-%3terms%1↔←, those of the usual ?λ-calculus, ?λ-terms, or just 
terms.  Notice that from the interpretation of ⊗O and the properties of ⊗b,
it is immediate that two ⊗→%gW%1-%3conversion rules↔←%1 are valid (and, as usual,
replacements from left to right will be called ⊗O-%3reductions%1):
.begin nofill; indent 15,15;

(⊗O%41%1)     ⊗O?M = ⊗O
(⊗O%42%1)     ∪λx.⊗O = ⊗O

.end

A ?λ⊗O-term is said to be in %3⊗→approximate normal form↔←%1 if it does not
contain a ?β-redex, and in ⊗→%gW%1-%3normal form%1↔← if it is not ⊗O-reducible.  An
approximate normal form which does not contain any occurrences of ⊗O may
be called a %3⊗→proper normal form%1↔←.  

A ?λ⊗O-term ?A is said to be an ⊗O-%3match of ?M if ?A and ?M are identical
except at subterms which are occurrences of ⊗O in ?A.  ?A is said to be
a %3⊗→direct approximant↔← of ?M if ?A is an ⊗→%gW%1-match↔← of ?M and is in approximate
normal form, and is said to be the %3⊗→best direct approximant↔← of ?M if ⊗O
occurs in ?A only at the subterms corresponding to the  ?β-redexes
in ?M.  (Clearly, every direct approximant is π≤ the best one.)  ?A is said
to be an %3approximate normal form of ?M if ?A is a direct approximant of a
term to which ?M is reducible; if also ?A is in ⊗O-normal form, ?A is said
to be a %3⊗→reduced approximant↔← of ?M.

For a term ?M, we shall write ⊗→%6A%d(M)%1↔← for the set of all its approximate
normal forms.  It is definitely necessary to consider "all" approximate
normal forms, not just those obtained using a particular order of reduction.  Before
stating the limit theorem about the sets ⊗A(?M), we note the following lemma.

%7LEMMA 15%1   If ?M0 ?β-$r ?M1 and if ?A0, ?A1 are the best direct approximants 
of ?M0, ?M1, respectively, then ?A0π≤?A1.  (The process of taking best direct
approximants is monotonic with respect to ?β-reduction.)

The proof of this lemma is not difficult, but it is very long and tedious.  It
is given in Appendix 1 to chapter 6 of Wadsworth [15].

%7Corollary 15.1%1  For any term ?M, the set ⊗A(?M) is directed.
.begin indent 0,10;
Proof:#####Since every direct approximant of a term  π≤ the best direct
approximant, it suffices to consider the subset of ⊗A(?M) consisting of the
best direct approximants of terms to which ?M is reducible.  Suppose ?A1, ?A2,
are two such members of ⊗A(?M), and let ?M1, ?M2 be the terms to which ?M is
reducible for which ?A1, ?A2 are the best direct approximants.  Then 
?M1#$c#?M2, so by the Church-Rosser theorem there is a term ?M3 such that
?M1#$r#?M3 and ?M2#$r#?M3.  Let ?A3 be the best direct approximant of ?M3.  Then
?A3ε⊗A(?M) and, by Lemma 15, ?A1π≤?A3 and ?A2π≤?A3.  Since ⊗A(?M) is clearly
non-empty (⊗Oε⊗A(?M) always), this implies ⊗A(?M) is directed.

.end

The main limit theorem can now be stated as follows, where we write in the
πV and πr to emphasize the distinction between syntactic and semantic concepts.

.group
%7THEOREM 16%1 For all terms ?M and environments πr, 
.begin center;turnoff "{", "}";
$V?M|(πr) = πl{$V?A|(πr) : ?Aε⊗A(?M)}
(All terms are the limit of their approximate normal forms.)
.end
.apart

This theorem (for proof see Wadsworth [15]) is the proper formulation of our
comment about the "limiting completeness" of the reduction rules.  The 
reduction rules are adequate for generating the set of terms to which ?M
reduces and as these are generated we obtain more and more of the approximate
normal forms of ?M.  Thus the theorem asserts that if we are willing to
generate enough of the terms to which ?M reduces, then we can approximate
(the value of) ?M as closely as we like - intuitively, we can "compute" the
values of terms in $D by listing (the values of) their approximate normal forms.

In another sense, the theorem can be regarded as asserting that %2every%1
term has a "normal form", it's just that this may be an infinite 
expression.  Several methods have been suggested for incorporating infinite
normal forms.  When done in the right way it should be possible to say that
two terms have the same value in $D iff they have the same infinite normal
form.

As one immediate consequence of Theorem 16, we can now give a complete
characterization of those terms which have value ⊗b (for all enviroments πr)
in the models.

.group
%7THEOREM 17%1  The following three conditions are equivalent:
.begin nofill;indent 15,15;
1)  ?M $= ⊗O
2)  ?M is unsolvable
3)  ?M does not have a head normal form.
.end
.apart
.begin indent 0,10;
Proof:#####We give the proof for closed terms ?M. (The extension to terms
with free variables - in which case, for solvability, one must consider
substitutions of closed terms for the variables - is straightforward, but
messier.)  We prove 1) => 2) => 3) => 1).
.begin indent 10,15; nofill;
.group
1) =>2)
Suppose ?M$=⊗O but ?M is solvable, then there exist terms ?N1k (k≥0) such that
	?M?N1K $c ?I
But then,  ?I $= ?M?N1K $= ⊗O?N1K $= ⊗O  (by ⊗O%41%1)
which is a contradiction.
.apart
.group

2) =>3)
Suppose ?M has a head normal form
.begin center;
∪H ≡ λ?x1N∪.z?X1M  ∩n,m≥0
.end
where ∪z ≡ ?xi for some i (since ?M is closed).  Let
			?I        if j≠i
		?N%4j ∩≡
			?K%8m?I   if j=i
Then ?M?N1N $r ∪H?N1N $r ?I,  so ?M is solvable; a contradiction.
.apart
.group

3) => 1) 
.fill;indent 10,10;
Suppose ?M has no head normal form. Then no term ?M' to which ?M reduces is %2in∩
head normal form.  When ?M' is not in head normal form, all its direct approximates
are of the form
.begin center; 
?A' ∪≡ λ?x1N∪.⊗O?X1M
.end
But ?A' $= ⊗O  by the rules (⊗O%41∩) and (⊗O%42∩).  Hence all the approximate
normal forms of ?M are $=#⊗O, so ?M#$=#⊗O  by Theorem 16.
.end
.end

We can find additional support for wishing Theorems 16 and 17 to hold in any
"reasonable" model for the ?λ-calculus.  This derives from several purely formal
results interrelating terms and their approximate normal forms as regards the
"useful" results they can produce when used as parts of larger expressions.

.begin nofill;

%7THEOREM 18%1   For all terms ?M and contexts ∪C[ ]∩,
.begin fill;indent 15,15;
	a) ∪C[M] ∩has a normal form iff there exists ?Aε⊗A(?M), such that
		∪C[A]∩ has the same normal form.

	b) ∪C[M]∩ has a head normal form iff there exists ?Aε⊗A(?M), such that
		∪C[A]∩ has a similar head normal form.
.end
Proof:
.begin indent 10,10; fill;
Part a) is proved by syntactic arguments based on the relative lengths of the
normal order reductions of terms and their ⊗O-matches.  The proof is given in full
in Wadsworth [15].  We will prove only part b) which can be accomplished more
easily by using Theorems 16 and 17.
.end
.begin indent 10,10; tabit2 (15,45); turnoff "{", "}";
.group
=> :
\∪C[M]∩ has a h.n.f. => ∪C[M] $≠ ⊗O\by Theorem 17
\=> πl{∪C[A] : A∩ε⊗A(?M)} $≠ ⊗O\by Theorem 16 and continuity
\=> ∃ ?Aε⊗A(?M), ∪C[A] $≠ ⊗O\otherwise ∪C[M] $= ⊗O
\=> ∃ ?Aε⊗A(?M), ∪C[A]∩ has a h.n.f.\by Theorem 17
.apart
.group

<= :
\Suppose ?Aε⊗A(?M), so certainly ?Aπ≤?M.  Then
\∪C[A]∩ has a h.n.f. => ∪C[A]$≠⊗O\by Theorem 17
\=> ∪C[M] $≠ ⊗O\by ?Aπ≤?M 
\=> ∪C[M]∩ has a h.n.f.\by Theorem 17

.end
.fill;indent 10,10;
For both cases the h.n.f.'s involved must be similar, for it is easily shown
that dissimilar h.n.f.'s are incomparable under π≤.
.end

.begin nofill;

%7Corollary 18.1∩  Suppose ?U is unsolvable and ∪C[ ]∩ is any context.  Then
		∪C[U]∩ has a normal form (h.n.f.) 
		  iff ∪C[⊗O∪]∩ has the same normal form (similar h.n.f.)
		  iff for all terms ?M, ∪C[M]∩ has the same normal form (similar h.n.f.)
.end

Intuitively, corollary 18.1 shows why unsolvable terms should be regarded as
being "least-defined".  Theorem 18 shows that all the information about how a
term may be used to give a "useful" result is contained in its set of approximate
normal forms.  Or, in other words, the "interesting" behaviour of a term is
determined by its approximate normal forms, which is why we expect Theorem 16 to
hold in any "reasonable" model.

In fact, Theorem 16 can be extended to apply also to the approximate normal forms
of ∪C[M]∩.  It can be proved that, for all terms ?M and contexts ∪C[#]∩,
.begin center;turn off "{","}";
%d∀?A'ε⊗A(∪C[M]∩), ∃?Aε⊗A(?M), such that ?A'ε⊗A(∪C[A]∩)
or, equivalently,
⊗A(∪C[M]∩) = πl{⊗A(∪C[A]∩): where ?Aε⊗A(?M)}
.end

In the case of functional application, taking ∪C[#]#≡#F[#]∩, this specializes to
.begin center;
%d∀?A'ε⊗A(∪FM∩), %d∃?Aε⊗A(?M), such that ?A'ε⊗A(∪FA∩)
.end
That is, to obtain an approximate normal form of a function application ∪FM∩ all
we need is an approximate normal form of the argument ?M.  This is just a ?λ-calculus
analogue of the ideas motivating the continuity of functions on lattices.